{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\nSchedule Primitives in TVM\n==========================\n**Author**: `Ziheng Jiang <https://github.com/ZihengJiang>`_\n\nTVM is a domain-specific language for efficient kernel construction.\n\nIn this tutorial, we will show you how to schedule the computation using\nvarious primitives provided by TVM.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from __future__ import absolute_import, print_function\n\nimport tvm\nfrom tvm import te\nimport numpy as np"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "There are often several ways to compute the same result;\nhowever, different ways can lead to different locality and\nperformance. TVM therefore asks the user to specify how the\ncomputation is executed, which is called a **Schedule**.\n\nA **Schedule** is a set of transformations of a computation that\ntransform the loops of computations in the program.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# declare some variables for use later\nn = te.var(\"n\")\nm = te.var(\"m\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A schedule can be created from a list of ops. By default, the\nschedule computes each tensor serially, in row-major order.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# declare a matrix element-wise multiply\nA = te.placeholder((m, n), name=\"A\")\nB = te.placeholder((m, n), name=\"B\")\nC = te.compute((m, n), lambda i, j: A[i, j] * B[i, j], name=\"C\")\n\ns = te.create_schedule([C.op])\n# tvm.lower transforms the computation from its definition into a real\n# callable function. With `simple_mode=True`, it returns a readable,\n# C-like statement; we print it here to inspect the result of the\n# schedule.\nprint(tvm.lower(s, [A, B, C], simple_mode=True))"
      ]
    },
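    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make the printed loop nest concrete, the NumPy sketch below mirrors what the default schedule does for :code:`C`: two nested serial loops, with :code:`i` outside and :code:`j` inside, so elements are visited in row-major order. It is a hand-written emulation for illustration, not code generated by TVM, and the 3 x 4 size is an arbitrary assumption.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\n# Hypothetical concrete sizes standing in for the symbolic m and n;\n# new names are used so the te.var objects above are not overwritten.\nm_val, n_val = 3, 4\na_np = np.arange(m_val * n_val, dtype=\"float32\").reshape(m_val, n_val)\nb_np = np.full((m_val, n_val), 2.0, dtype=\"float32\")\nc_np = np.empty((m_val, n_val), dtype=\"float32\")\n\n# Default schedule: serial loops, rows (i) outside, columns (j) inside.\nvisit_order = []\nfor i in range(m_val):\n    for j in range(n_val):\n        c_np[i, j] = a_np[i, j] * b_np[i, j]\n        visit_order.append((i, j))\n\nprint(np.array_equal(c_np, a_np * b_np))\nprint(visit_order[:5])"
      ]
    },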
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A schedule is composed of multiple stages, and each\n**Stage** represents the schedule for one operation. TVM provides various\nmethods to schedule each stage.\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "split\n-----\n:code:`split` can split a specified axis into two axes by\n:code:`factor`, which sets the extent of the inner axis.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "A = te.placeholder((m,), name=\"A\")\nB = te.compute((m,), lambda i: A[i] * 2, name=\"B\")\n\ns = te.create_schedule(B.op)\nxo, xi = s[B].split(B.op.axis[0], factor=32)\nprint(tvm.lower(s, [A, B], simple_mode=True))"
      ]
    },
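    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a rough illustration of the loop nest printed above, the plain Python sketch below emulates :code:`split(..., factor=32)`: the outer loop runs :code:`ceil(m / 32)` times, the inner loop runs 32 times, and a bounds check skips the out-of-range iterations when :code:`m` is not a multiple of 32. The helper :code:`split_by_factor` and the length 70 are hypothetical, chosen only for this example.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\n# Hypothetical helper: indices visited by a loop split with `factor`.\ndef split_by_factor(extent, factor):\n    visits = []\n    n_outer = (extent + factor - 1) // factor  # ceil(extent / factor)\n    for outer in range(n_outer):\n        for inner in range(factor):  # the inner extent equals factor\n            idx = outer * factor + inner\n            if idx < extent:  # guard for the last, partial chunk\n                visits.append(idx)\n    return visits\n\nm_val = 70  # hypothetical length, deliberately not a multiple of 32\na_np = np.arange(m_val, dtype=\"float32\")\nb_np = np.empty_like(a_np)\nfor idx in split_by_factor(m_val, 32):\n    b_np[idx] = a_np[idx] * 2\n\n# Splitting alone does not change the visiting order.\nprint(split_by_factor(m_val, 32) == list(range(m_val)))\nprint(np.array_equal(b_np, a_np * 2))"
      ]
    },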
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "You can also split an axis by :code:`nparts`, which works the other\nway around from :code:`factor`: it sets the extent of the outer axis\n(the number of parts) rather than the inner one.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "A = te.placeholder((m,), name=\"A\")\nB = te.compute((m,), lambda i: A[i], name=\"B\")\n\ns = te.create_schedule(B.op)\nbx, tx = s[B].split(B.op.axis[0], nparts=32)\nprint(tvm.lower(s, [A, B], simple_mode=True))"
      ]
    },
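    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The difference between the two options shows up in the loop extents. In the plain Python sketch below (the helper :code:`split_extents` is hypothetical), :code:`factor` fixes the inner extent while :code:`nparts` fixes the outer extent, and in both cases the other extent is derived by ceiling division.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Hypothetical helper: (outer_extent, inner_extent) of a split.\ndef split_extents(extent, factor=None, nparts=None):\n    if (factor is None) == (nparts is None):\n        raise ValueError(\"pass exactly one of factor or nparts\")\n    if factor is not None:\n        # factor fixes the inner extent; the outer one is derived.\n        return (extent + factor - 1) // factor, factor\n    # nparts fixes the outer extent; the inner one is derived.\n    return nparts, (extent + nparts - 1) // nparts\n\nprint(split_extents(100, factor=32))  # (4, 32)\nprint(split_extents(100, nparts=32))  # (32, 4)"
      ]
    },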
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "tile\n----\n:code:`tile` helps you execute the computation tile by tile over two\naxes.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "A = te.placeholder((m, n), name=\"A\")\nB = te.compute((m, n), lambda i, j: A[i, j], name=\"B\")\n\ns = te.create_schedule(B.op)\nxo, yo, xi, yi = s[B].tile(B.op.axis[0], B.op.axis[1], x_factor=10, y_factor=5)\nprint(tvm.lower(s, [A, B], simple_mode=True))"
      ]
    },
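    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Conceptually, :code:`tile` splits both axes and then orders the four resulting loops as :code:`(xo, yo, xi, yi)`, so one small tile is finished before the next one starts. The plain Python sketch below records that visiting order for a hypothetical 4 x 6 matrix with 2 x 3 tiles; the helper :code:`tile_order` is illustrative only.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Hypothetical helper: (i, j) visiting order for tile(x_factor, y_factor),\n# i.e. loops ordered (xo, yo, xi, yi) with a bounds check for the tails.\ndef tile_order(rows, cols, x_factor, y_factor):\n    visits = []\n    for tile_i in range((rows + x_factor - 1) // x_factor):\n        for tile_j in range((cols + y_factor - 1) // y_factor):\n            for in_i in range(x_factor):\n                for in_j in range(y_factor):\n                    i, j = tile_i * x_factor + in_i, tile_j * y_factor + in_j\n                    if i < rows and j < cols:\n                        visits.append((i, j))\n    return visits\n\ntile_visits = tile_order(4, 6, x_factor=2, y_factor=3)\n# The first 2 x 3 tile is finished before the next tile starts.\nprint(tile_visits[:6])\n# Every element is still visited exactly once.\nprint(sorted(tile_visits) == [(i, j) for i in range(4) for j in range(6)])"
      ]
    },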
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "fuse\n----\n:code:`fuse` can fuse two consecutive axes of one computation.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "A = te.placeholder((m, n), name=\"A\")\nB = te.compute((m, n), lambda i, j: A[i, j], name=\"B\")\n\ns = te.create_schedule(B.op)\n# tile to four axes first: (i.outer, j.outer, i.inner, j.inner)\nxo, yo, xi, yi = s[B].tile(B.op.axis[0], B.op.axis[1], x_factor=10, y_factor=5)\n# then fuse (i.inner, j.inner) into one axis: (i.inner.j.inner.fused)\nfused = s[B].fuse(xi, yi)\nprint(tvm.lower(s, [A, B], simple_mode=True))"
      ]
    },
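    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Fusing two loops with extents :code:`X` and :code:`Y` yields a single loop with extent :code:`X * Y`; the original indices can be recovered with integer division and modulo. The plain Python check below uses the inner extents 10 and 5 from the example above and confirms that the fused loop visits the same pairs in the same order; the variable names are hypothetical.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Extents of i.inner and j.inner after tiling with x_factor=10, y_factor=5.\nx_extent, y_extent = 10, 5\n\nnested = [(ii, jj) for ii in range(x_extent) for jj in range(y_extent)]\n\n# One fused loop; the original indices are recovered with // and %.\nfused_visits = []\nfor f in range(x_extent * y_extent):\n    fused_visits.append((f // y_extent, f % y_extent))\n\nprint(len(fused_visits))  # 50\nprint(fused_visits == nested)  # True"
      ]
    },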
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "reorder\n-------\n:code:`reorder` can reorder the axes in the specified order.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "A = te.placeholder((m, n), name=\"A\")\nB = te.compute((m, n), lambda i, j: A[i, j], name=\"B\")\n\ns = te.create_schedule(B.op)\n# tile to four axes first: (i.outer, j.outer, i.inner, j.inner)\nxo, yo, xi, yi = s[B].tile(B.op.axis[0], B.op.axis[1], x_factor=10, y_factor=5)\n# then reorder the axes: (i.inner, j.outer, i.outer, j.inner)\ns[B].reorder(xi, yo, xo, yi)\nprint(tvm.lower(s, [A, B], simple_mode=True))"
      ]
    },
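    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        ":code:`reorder` only changes how the loops are nested; the set of elements that are computed stays the same. The plain Python sketch below enumerates the four tiled axes in the original order :code:`(xo, yo, xi, yi)` and in the reordered :code:`(xi, yo, xo, yi)` for a hypothetical 20 x 10 matrix, and compares the two visiting sequences.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import itertools\n\n# Hypothetical extents for a 20 x 10 matrix tiled with x_factor=10, y_factor=5.\nloop_extents = {\"xo\": 2, \"yo\": 2, \"xi\": 10, \"yi\": 5}\n\n# Return the (i, j) element visited at each step for a given loop nesting;\n# itertools.product varies its last range fastest, like an innermost loop.\ndef visit(loop_order):\n    visits = []\n    for values in itertools.product(*(range(loop_extents[name]) for name in loop_order)):\n        idx = dict(zip(loop_order, values))\n        visits.append((idx[\"xo\"] * 10 + idx[\"xi\"], idx[\"yo\"] * 5 + idx[\"yi\"]))\n    return visits\n\norig_visits = visit([\"xo\", \"yo\", \"xi\", \"yi\"])\nreord_visits = visit([\"xi\", \"yo\", \"xo\", \"yi\"])\n\nprint(orig_visits[:6])\nprint(reord_visits[:6])\n# Same set of elements, different order.\nprint(sorted(orig_visits) == sorted(reord_visits), orig_visits == reord_visits)"
      ]
    },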
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "bind\n----\n:code:`bind` can bind a specified axis to a thread axis, which is\noften used in GPU programming.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "A = te.placeholder((n,), name=\"A\")\nB = te.compute(A.shape, lambda i: A[i] * 2, name=\"B\")\n\ns = te.create_schedule(B.op)\nbx, tx = s[B].split(B.op.axis[0], factor=64)\ns[B].bind(bx, te.thread_axis(\"blockIdx.x\"))\ns[B].bind(tx, te.thread_axis(\"threadIdx.x\"))\nprint(tvm.lower(s, [A, B], simple_mode=True))"
      ]
    },
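    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "On a GPU, each :code:`(blockIdx.x, threadIdx.x)` pair executes the loop body once, in parallel, instead of the two loops running serially. The plain Python sketch below only simulates the index mapping of that launch grid on the CPU, for a hypothetical length of 200: it computes :code:`block * 64 + thread` for every pair and checks that each element is handled by exactly one thread. It is not a model of GPU execution or of the code TVM generates.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from collections import Counter\n\nn_val = 200  # hypothetical length of A\nthreads_per_block = 64  # matches factor=64 in the split above\nnum_blocks = (n_val + threads_per_block - 1) // threads_per_block\n\n# Simulate the launch grid serially: every (block, thread) pair computes\n# one global index; threads past the end of the array do nothing.\nhits = Counter()\nfor block_id in range(num_blocks):  # plays the role of blockIdx.x\n    for thread_id in range(threads_per_block):  # plays the role of threadIdx.x\n        i = block_id * threads_per_block + thread_id\n        if i < n_val:\n            hits[i] += 1\n\nprint(num_blocks)  # 4\nprint(len(hits) == n_val and set(hits.values()) == {1})  # True"
      ]
    },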
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "compute_at\n----------\nFor a schedule that consists of multiple operators, TVM by default\ncomputes each tensor separately at the root.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "A = te.placeholder((m,), name=\"A\")\nB = te.compute((m,), lambda i: A[i] + 1, name=\"B\")\nC = te.compute((m,), lambda i: B[i] * 2, name=\"C\")\n\ns = te.create_schedule(C.op)\nprint(tvm.lower(s, [A, B, C], simple_mode=True))"
      ]
    },
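    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "With the default schedule, :code:`B` is computed completely in its own loop before the loop for :code:`C` starts. The NumPy emulation below mirrors that structure for a hypothetical length of 8; it is an illustration, not TVM output.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\nm_val = 8  # hypothetical length\na_np = np.arange(m_val, dtype=\"float32\")\n\n# Stage B at the root: a full intermediate buffer is written first.\nb_buf = np.empty(m_val, dtype=\"float32\")\nfor i in range(m_val):\n    b_buf[i] = a_np[i] + 1\n\n# Stage C at the root: a second, separate loop reads that buffer.\nc_root = np.empty(m_val, dtype=\"float32\")\nfor i in range(m_val):\n    c_root[i] = b_buf[i] * 2\n\nprint(c_root)"
      ]
    },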
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        ":code:`compute_at` can move the computation of `B` into the first axis\nof the computation of `C`.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "A = te.placeholder((m,), name=\"A\")\nB = te.compute((m,), lambda i: A[i] + 1, name=\"B\")\nC = te.compute((m,), lambda i: B[i] * 2, name=\"C\")\n\ns = te.create_schedule(C.op)\ns[B].compute_at(s[C], C.op.axis[0])\nprint(tvm.lower(s, [A, B, C], simple_mode=True))"
      ]
    },
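    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "After :code:`compute_at`, the value of :code:`B` at index :code:`i` is produced inside the loop of :code:`C`, just before it is used, which can improve locality. The NumPy emulation below, for a hypothetical length of 8, mirrors that single-loop structure.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\nm_val = 8  # hypothetical length\na_np = np.arange(m_val, dtype=\"float32\")\nb_buf = np.empty(m_val, dtype=\"float32\")\nc_at = np.empty(m_val, dtype=\"float32\")\n\n# One loop over C's axis: B[i] is produced right before C[i] consumes it.\nfor i in range(m_val):\n    b_buf[i] = a_np[i] + 1\n    c_at[i] = b_buf[i] * 2\n\nprint(np.array_equal(c_at, (a_np + 1) * 2))"
      ]
    },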
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "compute_inline\n--------------\n:code:`compute_inline` can mark one stage as inline; the body of its\ncomputation is then expanded and inserted at each place where the\ntensor is required.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "A = te.placeholder((m,), name=\"A\")\nB = te.compute((m,), lambda i: A[i] + 1, name=\"B\")\nC = te.compute((m,), lambda i: B[i] * 2, name=\"C\")\n\ns = te.create_schedule(C.op)\ns[B].compute_inline()\nprint(tvm.lower(s, [A, B, C], simple_mode=True))"
      ]
    },
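    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Inlining substitutes the expression of :code:`B` into :code:`C`, so :code:`C[i]` is computed as :code:`(A[i] + 1) * 2` and :code:`B` no longer needs a loop of its own. The NumPy emulation below, with a hypothetical length of 8, shows the resulting single loop.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\nm_val = 8  # hypothetical length\na_np = np.arange(m_val, dtype=\"float32\")\nc_inline = np.empty(m_val, dtype=\"float32\")\n\n# B's body (A[i] + 1) is substituted directly into C's expression.\nfor i in range(m_val):\n    c_inline[i] = (a_np[i] + 1) * 2\n\nprint(c_inline[3])  # 8.0"
      ]
    },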
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "compute_root\n------------\n:code:`compute_root` can move the computation of one stage back to the root.\nIn the example below, it undoes the preceding :code:`compute_at`.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "A = te.placeholder((m,), name=\"A\")\nB = te.compute((m,), lambda i: A[i] + 1, name=\"B\")\nC = te.compute((m,), lambda i: B[i] * 2, name=\"C\")\n\ns = te.create_schedule(C.op)\ns[B].compute_at(s[C], C.op.axis[0])\ns[B].compute_root()\nprint(tvm.lower(s, [A, B, C], simple_mode=True))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Summary\n-------\nThis tutorial provides an introduction to schedule primitives in\nTVM, which let users schedule computations easily and\nflexibly.\n\nTo get a high-performance kernel implementation, the\ngeneral workflow is often:\n\n- Describe your computation via a series of operations.\n- Try to schedule the computation with primitives.\n- Compile and run to see the performance difference.\n- Adjust your schedule according to the running result.\n\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}